Abstract. We give a forbidden pattern characterization for the class of
generalized definite languages, show that the corresponding decision problem is
NL-complete and can be solved in quadratic time. We also show that
their syntactic complexity coincides with that of the definite languages
and give an upper bound of n! for this measure.
1 Introduction



A language is generalized definite if membership of a word can be decided by
looking at its prefix and suffix of a given constant length. Generalized definite
languages and automata were introduced by Ginzburg [?] in 1966 and further
studied in e.g. [4, 5, 13, 15]. This language class is strictly contained in the
class of star-free languages, lying on the first level of the dot-depth hierarchy [1].
The class possesses a characterization in terms of its syntactic semigroup [12]:
a regular language is generalized definite if and only if its syntactic semigroup
is locally trivial, that is, if and only if it satisfies the identity
$x^\omega y x^\omega = x^\omega$. This characterization is hardly efficient by itself when the
language is given by its minimal automaton, since the syntactic semigroup can be much
larger than the automaton: a construction of a definite language with state complexity n
(that is, the number of states of its minimal automaton) and syntactic complexity
$\lfloor e(n-1)! \rfloor$ (that is, the size of the transition semigroup of its minimal
automaton) is given explicitly in [2]. However, as stated in [13, Sec. 5.4], it is usually
not necessary to compute the (ordered) syntactic semigroup; most of the time one can
develop a more efficient algorithm by analyzing the minimal automaton. As an example
of this line of research, the authors of [9] recently gave a nice characterization
of minimal automata of piecewise testable languages, yielding a quadratic-time
decision algorithm, matching an alternative (but of course equivalent) earlier
(also quadratic) characterization of [17], which improved the $O(n^5)$ bound of [15].

In this paper we give a forbidden pattern characterization of generalized definite
languages in terms of the minimal automaton, and analyze the complexity of deciding
whether a given automaton recognizes a generalized definite language. This yields
an NL-completeness result (with respect to logspace reductions) as well as a
deterministic decision procedure running in $O(n^2)$ time (on a RAM).
There is an ongoing line of research on the syntactic complexity of regular languages.
In general, a regular language with state complexity n can have syntactic
complexity $n^n$, already when there are only three input letters.
There are at least two possible modifications of the problem. One option is to
consider the case when the input alphabet is binary (as done e.g. in [7, 10]). The
second option is to study a strict subclass of regular languages. In this case, the
syntactic complexity of a class C of languages is a function $n \mapsto f(n)$, with $f(n)$
being the maximal syntactic complexity of a member of C whose state
complexity is (at most) n. The syntactic complexity of several language classes,
e.g. (co)finite, reverse definite, bifix-, factor- and subword-free languages,
is precisely determined in [11]. However, the exact syntactic complexity of the
(generalized) definite languages and that of the star-free languages (as well as
the locally testable or the locally threshold testable languages) is not known yet.

We also address this problem and show that the syntactic complexity of generalized
definite languages coincides with that of definite languages, and give an
upper bound of $n!$ for this measure. Since the lower bound is $\Omega((n-1)!)$, this is
asymptotically optimal up to a factor of $n$, which is logarithmic in $n!$.



2 Notation 



We assume the reader is familiar with the standard notions of automata and
language theory; nevertheless, we summarize the notation below.

When $n \geq 0$ is an integer, $[n]$ stands for the set $\{1, \ldots, n\}$. An alphabet is a
nonempty finite set $\Sigma$. The set of words over $\Sigma$ is denoted $\Sigma^*$, while $\Sigma^+$ stands
for the set of nonempty words. The empty word is denoted $\varepsilon$. A language over
$\Sigma$ is an arbitrary set $L \subseteq \Sigma^*$ of $\Sigma$-words.

A (finite) automaton (over $\Sigma$) is a system $A = (Q, \Sigma, \delta, q_0, F)$ where $Q$ is the
finite set of states, $q_0 \in Q$ is the start state, $F \subseteq Q$ is the set of final (or accepting)
states, and $\delta : Q \times \Sigma \to Q$ is the transition function. The transition function $\delta$
extends in a unique way to a right action of the monoid $\Sigma^*$ on $Q$, also denoted $\delta$
for ease of notation. When $\delta$ is understood, we write $q \cdot u$, or simply $qu$, for $\delta(q, u)$.
Moreover, when $C \subseteq Q$ is a subset of states and $u \in \Sigma^*$ is a word, let $Cu$ stand
for the set $\{pu : p \in C\}$, and when $L$ is a language, $CL = \{pu : p \in C, u \in L\}$.
The language recognized by $A$ is $L(A) = \{x \in \Sigma^* : q_0 x \in F\}$. A language is
regular if it can be recognized by some finite automaton.

The state $q \in Q$ is reachable from a state $p \in Q$ in $A$, denoted $p \preceq_A q$, or just
$p \preceq q$ if there is no danger of confusion, if $pu = q$ for some $u \in \Sigma^*$. An automaton
is connected if its states are all reachable from its start state.

Two states $p$ and $q$ of $A$ are distinguishable if there exists a word $u \in \Sigma^*$ such
that exactly one of $pu$ and $qu$ belongs to $F$. In this case we say that $u$ separates
$p$ and $q$. A connected automaton is called reduced if each pair of distinct states
is distinguishable.



It is known that for each regular language $L$ there exists a reduced automaton
$A_L$, unique up to isomorphism, recognizing $L$. $A_L$ can be computed from any
automaton recognizing $L$ by an efficient algorithm called minimization, and is
called the minimal automaton of $L$.

The classes of the equivalence relation $p \sim q \iff (p \preceq q$ and $q \preceq p)$ are called
components of $A$. A component $C$ is trivial if $C = \{p\}$ for some state $p$ such that
$pa \neq p$ for every $a \in \Sigma$, and is a sink if $C\Sigma \subseteq C$. It is clear that each automaton
has at least one sink and sinks are never trivial. The component graph $\Gamma(A)$ of
$A$ is an edge-labelled directed graph $(V, E, \ell)$ along with a mapping $c : Q \to V$,
where $V$ is the set of the $\sim$-classes of $A$, the mapping $c$ associates to each state
$q$ its class $q/{\sim} = \{p : p \sim q\}$, and for two classes $p/{\sim}$ and $q/{\sim}$ there exists
an edge from $p/{\sim}$ to $q/{\sim}$ labelled by $a \in \Sigma$ if and only if $p'a = q'$ for some
$p' \sim p$, $q' \sim q$. It is known that the component graph can be constructed from $A$
in linear time. Note that the mapping $c$ is redundant, but it makes it possible to
determine whether $p \sim q$ holds in constant time on a RAM, provided
$Q = [n]$ for some $n > 0$ and $c$ is stored as an array.
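To illustrate the components and the constant-time test $p \sim q$, here is a minimal Python sketch. It assumes our own encoding, which is not the paper's: states are 0..n-1, letters are 0..m-1, and `delta[q][a]` is $qa$. It uses Kosaraju's strongly connected components algorithm, one standard linear-time choice. The paper does not say which algorithm it uses, and the name `components` is hypothetical.

```python
from typing import List, Tuple


def components(delta: List[List[int]]) -> Tuple[List[int], List[bool], List[bool]]:
    """Components of the automaton with states 0..n-1 and delta[q][a] = qa.

    Returns (comp, trivial, sink): comp[q] indexes the component of q, so
    p ~ q iff comp[p] == comp[q] (a constant-time test); trivial[i] says whether
    component i is one state without a self-loop; sink[i] says whether no
    transition leaves component i.
    """
    n = len(delta)
    succ = [sorted(set(row)) for row in delta]
    pred: List[List[int]] = [[] for _ in range(n)]
    for p in range(n):
        for q in succ[p]:
            pred[q].append(p)

    # Kosaraju, pass 1: iterative DFS recording states in finishing order.
    order, seen = [], [False] * n
    for s in range(n):
        if seen[s]:
            continue
        seen[s] = True
        stack = [(s, iter(succ[s]))]
        while stack:
            v, it = stack[-1]
            w = next(it, None)
            if w is None:
                stack.pop()
                order.append(v)
            elif not seen[w]:
                seen[w] = True
                stack.append((w, iter(succ[w])))

    # Pass 2: sweep the reversed graph in decreasing finishing order.
    comp, count = [-1] * n, 0
    for s in reversed(order):
        if comp[s] != -1:
            continue
        comp[s], todo = count, [s]
        while todo:
            v = todo.pop()
            for w in pred[v]:
                if comp[w] == -1:
                    comp[w] = count
                    todo.append(w)
        count += 1

    size = [0] * count
    for q in range(n):
        size[comp[q]] += 1
    trivial, sink = [False] * count, [True] * count
    for p in range(n):
        if size[comp[p]] == 1 and p not in succ[p]:
            trivial[comp[p]] = True
        for q in succ[p]:
            if comp[q] != comp[p]:
                sink[comp[p]] = False
    return comp, trivial, sink
```

For example, with `delta = [[1, 1], [2, 1], [2, 2]]`, state 0 forms a trivial component, state 1 forms a nontrivial non-sink component (it has a self-loop), and state 2 forms a sink.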

When $A$ and $B$ are sets, $A^B$ denotes the set of all functions $f : B \to A$.
When $f : B \to A$ and $C \subseteq B$, then $f|_C : C \to A$ denotes the restriction of
$f$ to $C$. When $A_1, \ldots, A_n$ are disjoint sets, $A$ is a set and for each $i \in [n]$,
$f_i : A_i \to A$ is a function, then the source tupling of $f_1, \ldots, f_n$ is the function
$$[f_1, \ldots, f_n] : \bigcup_{i \in [n]} A_i \to A, \qquad [f_1, \ldots, f_n](a) = f_i(a) \text{ for the unique } i \text{ with } a \in A_i.$$
Members of $Q^Q$ are called transformations of $Q$, forming a semigroup
with composition $(fg)(q) = g(f(q))$ as product. When $A = (Q, \Sigma, \delta, q_0, F)$ is
an automaton, its transformation semigroup $T(A)$ consists of the transformations
of $Q$ induced by nonempty words, i.e. $T(A) = \{u^A : u \in \Sigma^+\}$,
where $u^A : Q \to Q$ is the transformation defined as $q \mapsto qu$. A transformation
$f : Q \to Q$ is called permutational if there exists a set $D \subseteq Q$ with $|D| > 1$
on which $f$ induces a permutation; otherwise it is non-permutational. Observe
that a non-permutational transformation $f$ is idempotent (i.e. $ff = f$) if and
only if it is a constant function. Alternatively, a transformation $f : Q \to Q$ on a
finite set $Q$ is non-permutational if and only if $f^{|Q|}$ is constant. Another class
of functions used in the paper is that of the elevating functions: for integers
$0 \leq k \leq n$, a function $f : [k] \to [n]$ is elevating if $i < f(i)$ for each $i \in [k]$.
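The two characterizations of non-permutational transformations given above can be cross-checked by brute force on a small set. The sketch below is minimal and assumes our own encoding, in which a transformation of $Q = \{0, \ldots, n-1\}$ is a tuple `f` with `f[q]` the image of `q`. The function `compose` follows the paper's left-to-right convention $(fg)(q) = g(f(q))$. All helper names are hypothetical.

```python
from itertools import combinations, product
from typing import Tuple

Transformation = Tuple[int, ...]  # f[q] is the image of q in Q = {0, ..., n-1}


def compose(f: Transformation, g: Transformation) -> Transformation:
    """The product fg in the paper's convention (fg)(q) = g(f(q))."""
    return tuple(g[f[q]] for q in range(len(f)))


def power(f: Transformation, k: int) -> Transformation:
    """f^k, with f^0 the identity."""
    result = tuple(range(len(f)))
    for _ in range(k):
        result = compose(result, f)
    return result


def permutes_some_subset(f: Transformation) -> bool:
    """Definition: f induces a permutation on some D with |D| > 1, i.e. f(D) = D."""
    n = len(f)
    return any(
        {f[q] for q in D} == set(D)
        for size in range(2, n + 1)
        for D in combinations(range(n), size)
    )


def is_non_permutational(f: Transformation) -> bool:
    """Criterion from the text: f^{|Q|} is a constant function."""
    return len(set(power(f, len(f)))) == 1


# Cross-check both characterizations on all 27 transformations of a 3-element set.
assert all(is_non_permutational(f) != permutes_some_subset(f)
           for f in product(range(3), repeat=3))
```

The same loop can also confirm that, among non-permutational transformations, the idempotent ones are exactly the constant ones.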
3 Patterns for subclasses of the star-free languages



A language L is

– cofinite if its complement is finite;

– definite if there exists a constant $k \geq 0$ such that for any $x \in \Sigma^*$, $y \in \Sigma^k$
we have $xy \in L \iff y \in L$;

– reverse definite if there exists a constant $k \geq 0$ such that for any $x \in \Sigma^k$,
$y \in \Sigma^*$ we have $xy \in L \iff x \in L$;

– generalized definite if there exists a constant $k \geq 0$ such that for any $x_1, x_2 \in \Sigma^k$
and $y \in \Sigma^*$ we have $x_1 y x_2 \in L \iff x_1 x_2 \in L$.

These are all subclasses of the star-free languages, i.e. of the languages that can be built
from the singletons with repeated use of the concatenation, finite union and
complementation operations. It is known that the following decision problem is
PSPACE-complete: given a regular language L by its minimal automaton, is L
star-free? In contrast, the questions for the subclasses above are all tractable.

Minimal automata of the finite, cofinite, definite and reverse definite languages
possess a characterization in terms of forbidden patterns. In our setting, a pattern
is an edge-labelled directed graph $P = (V, E, \ell)$, where $V$ is the set of vertices,
$E \subseteq V^2$ is the set of edges, and $\ell : E \to X$ is a labelling function which
assigns a variable to each edge. An automaton $A = (Q, \Sigma, \delta, q_0, F)$ admits a
pattern $P = (V, E, \ell)$ if there exist an injective mapping $f : V \to Q$ and a mapping
$h : X \to \Sigma^+$ such that for each $(u, v) \in E$ labelled $x$ we have $f(u) \cdot h(x) = f(v)$.
Otherwise $A$ avoids $P$.

As an example, consider the pattern $P_f$ in Figure 1.







(a) Pattern $P_f$. (b) Pattern $P_d$. (c) Pattern $P_r$.

Fig. 1: Patterns for (co)finite, definite and reverse definite languages.



An automaton admits $P_f$ iff there exist distinct states $p, q \in Q$ and (not necessarily
distinct) words $x, y \in \Sigma^+$ such that $px = p$ and $qy = q$. It is easy to see
that an automaton $A$ avoids $P_f$ iff it has a unique sink, which consists
of a single state $p$, and all the other components are trivial. If $p$ is a rejecting
state, then $L(A)$ is finite; otherwise it is cofinite. The condition is also necessary
in the following sense: a language is finite or cofinite if and only if its minimal
automaton avoids $P_f$.

As further examples, consider the patterns $P_d$ and $P_r$ in Figure 1.

It is easy to see that if $A = (Q, \Sigma, \delta, q_0, F)$ is the minimal automaton of a
reverse definite language, then it avoids $P_r$. If there are states $p \neq q \in Q$ and
words $x, y \in \Sigma^+$ with $px = p$ and $py = q$, then $L = L(A)$ is not reverse definite.
Indeed, suppose $L$ is a $k$-reverse definite language and let $u$ be a word with
$q_0 u = p$. Since $p \neq q$ and $A$ is minimal, there is a word $w$ distinguishing $p$ and
$q$. Thus, $ux^k w$ and $ux^k yw$ are two words with the same prefix of length $k$, and
exactly one of them is in $L$, a contradiction.



Also, if $L = L(A)$ is a $k$-definite language with $A$ being its minimal automaton,
then $A$ avoids $P_d$. If there are states $p \neq q \in Q$ and a word $x \in \Sigma^+$ with $px = p$, $qx = q$,
then let $u, v, w \in \Sigma^*$ be words such that $q_0 u = p$, $q_0 v = q$ and $w$ separates $p$
and $q$. Then $ux^k w$ and $vx^k w$ have the same suffix of length $k$, with exactly one
of them being a member of $L$, a contradiction.

It can be seen (see e.g. [5]) that avoiding these patterns is also sufficient: a
regular language is definite (reverse definite, resp.) if and only if its minimal
automaton avoids $P_d$ ($P_r$, resp.). Note that avoiding $P_d$ is equivalent to the statement
that each nonempty word induces a transformation with at most one fixed point,
which is further equivalent to the statement that each nonempty word induces a
non-permutational transformation (see [11]).

Consequently, all the following questions are in the complexity class NL: given a
language $L$ by its minimal automaton, is $L$ (co)finite / definite / reverse definite?
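As a deterministic counterpart to the NL membership for definiteness, the following minimal sketch checks whether a reduced automaton avoids $P_d$. It uses our `delta[q][a]` encoding with states 0..n-1. The automaton admits $P_d$ exactly when some pair $(p, q)$ with $p \neq q$ returns to itself under a nonempty word. Such a cycle never passes through a pair with equal components, because equal states stay equal. So the check reduces to cycle detection, which the sketch does with Kahn's algorithm, among off-diagonal pairs. This pair-graph formulation and the name `avoids_Pd` are our illustration, not a procedure stated in the paper.

```python
from collections import deque
from typing import Dict, List, Tuple

Pair = Tuple[int, int]


def avoids_Pd(delta: List[List[int]]) -> bool:
    """True iff no distinct p, q and nonempty x satisfy px = p and qx = q."""
    n, m = len(delta), len(delta[0])
    pairs = [(p, q) for p in range(n) for q in range(n) if p != q]
    succ: Dict[Pair, List[Pair]] = {v: [] for v in pairs}
    indeg: Dict[Pair, int] = {v: 0 for v in pairs}
    for p, q in pairs:
        for a in range(m):
            w = (delta[p][a], delta[q][a])
            if w[0] != w[1]:  # pairs that merge can never come back apart
                succ[(p, q)].append(w)
                indeg[w] += 1
    # Kahn's algorithm: repeatedly remove pairs without incoming edges.
    queue = deque(v for v in pairs if indeg[v] == 0)
    removed = 0
    while queue:
        v = queue.popleft()
        removed += 1
        for w in succ[v]:
            indeg[w] -= 1
            if indeg[w] == 0:
                queue.append(w)
    # Every pair removed <=> the off-diagonal pair graph is acyclic.
    return removed == len(pairs)
```

For instance, the two-state automaton remembering whether the last letter was `a` (`[[1, 0], [1, 0]]`) avoids $P_d$. The parity-of-`a` automaton (`[[1, 0], [0, 1]]`) admits it.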

4 Results

In this section we give a new characterization of the minimal automata of
generalized definite languages. It leads to an NL-completeness result for the
corresponding decision problem, as well as a low-degree polynomial deterministic
algorithm. We also show that the syntactic complexity of generalized definite
languages is the same as that of the definite languages, and we give an upper bound
of $n!$ for the syntactic complexity of (generalized) definite languages.

4.1 Forbidden pattern characterization

We need the following well-known lemma:

Lemma 1. For any nonempty finite set $C$ there exists a constant $m = m(|C|)$,
depending only on the size of $C$, such that in any product $f = f_1 f_2 \cdots f_m$ with
$f_i \in C^C$ for each $i \in [m]$, an idempotent factor appears, i.e. $f_j \cdots f_k$ is an
idempotent transformation of $C$ for some $1 \leq j \leq k \leq m$.
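Lemma 1 only asserts that an idempotent factor exists once the product is long enough. For a concrete sequence, such a factor (if there is one) can be located by brute force. The minimal sketch below uses our tuple encoding of transformations and the paper's left-to-right product. It runs in time quadratic in the length of the sequence, and the name `idempotent_factor` is hypothetical.

```python
from typing import Optional, Sequence, Tuple

Transformation = Tuple[int, ...]


def compose(f: Transformation, g: Transformation) -> Transformation:
    """(fg)(q) = g(f(q)): apply f first, then g."""
    return tuple(g[f[q]] for q in range(len(f)))


def idempotent_factor(fs: Sequence[Transformation]) -> Optional[Tuple[int, int]]:
    """Return 0-based (j, k), j <= k, with fs[j] ... fs[k] idempotent.

    Scans k from left to right and j from k downwards; None if no factor exists.
    """
    for k in range(len(fs)):
        g = fs[k]                      # g = fs[j] ... fs[k], starting with j = k
        for j in range(k, -1, -1):
            if j < k:
                g = compose(fs[j], g)  # prepend fs[j]
            if compose(g, g) == g:
                return j, k
    return None
```

For the 3-cycle `rot = (1, 2, 0)`, the sequence `[rot, rot, rot]` yields `(0, 2)`: its whole product is the identity. The one-element sequence `[rot]` has no idempotent factor.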

Note to the reviewers: we were unable to locate the first appearance of Lemma 1
together with a proof, thus we decided to include its proof in the Appendix.

We are ready to show that a regular language is generalized definite if and only
if its minimal automaton avoids the pattern $P_g$, depicted in Figure 2.

Theorem 1. The following are equivalent for a reduced automaton $A$:

i) $A$ avoids $P_g$.



¹ Since, to our knowledge, [5] has not yet been published in a peer-reviewed
journal or conference proceedings, we include a proof of this fact. Nevertheless, we
by no means claim this result as our own.




Fig. 2: Forbidden pattern $P_g$ for the generalized definite languages.



ii) Each nontrivial component of $A$ is a sink, and for each nonempty word $u$
and sink $C$ of $A$, the transformation $u|_C : C \to C$ is non-permutational.

iii) $A$ recognizes a generalized definite language.



Proof. Let $A = (Q, \Sigma, \delta, q_0, F)$ be a reduced automaton.

i) ⇒ ii). Suppose $A$ avoids $P_g$. Suppose that $u|_C$ is permutational for some sink
$C$ and word $u \in \Sigma^+$. Then there exists a set $D \subseteq C$ with $|D| > 1$ such that
$u$ induces a permutation on $D$. Then $x = u^{|D|!}$ is the identity on $D$. Choosing
arbitrary distinct states $p, q \in D$ and a word $y$ with $py = q$ (such a $y$ exists since $p$
and $q$ are in the same component of $A$), we get that $A$ admits $P_g$ via the $(p, q, x, y)$
defined above, a contradiction. Hence $u|_C$ is non-permutational for each sink $C$
and word $u \in \Sigma^+$.

Now assume there exists a nontrivial component $C$ which is not a sink. Then
$pu = p$ for some $p \in C$ and word $u \in \Sigma^+$. Since $C$ is not a sink, there exists
a sink $C' \neq C$ reachable from $p$ (i.e. all of its members are reachable from $p$).
Since $u$ induces a non-permutational transformation on $C'$, $x = u^{|C'|}$ induces a
constant function on $C'$. Let $q$ be the unique state in the image of $x|_{C'}$. Since
$C'$ is reachable from $p$, there exists some nonempty word $y$ such that $py = q$.
Hence $px = p$, $qx = q$, $py = q$, and $A$ admits $P_g$, a contradiction.

ii) ⇒ iii). Suppose the condition of ii) holds. We show that $L(A)$ is generalized
definite. Let $n = \max\{|Q|, m(1), \ldots, m(|Q|)\}$, where $m$ is the function of Lemma 1.
Let $x = x_1 y x_2$ with $x_1, x_2 \in \Sigma^n$ and $y \in \Sigma^*$. It suffices to show that
$q_0 x_1 y x_2 = q_0 x_1 x_2$. Since $|x_1| \geq |Q|$, some state $p$ is visited at least twice on the path
determined by $x_1$. Hence $p$ belongs to a nontrivial component $C$ of $A$, which has to be a sink by
the assumption of ii). Thus $q_0 x_1 \in C$ and $q_0 x_1 y \in C$ as well. Since $|x_2| \geq m(|C|)$,
Lemma 1 (applied to the transformations of $C$ induced by the letters of $x_2$) shows that $x_2$
can be written as $x_2 = x_{2,1} x_{2,2} x_{2,3}$ with $x_{2,2}$ inducing an idempotent function
on $C$. Since the function induced by $x_{2,2}$ is also non-permutational on $C$, it is
a constant function on $C$; hence $x_2$ induces a constant function on $C$ as well. Thus
$q_0 x_1 y x_2 = q_0 x_1 x_2$.

iii) ⇒ i). Suppose $L(A)$ is $k$-generalized definite for some $k \geq 0$ and that $A$ admits
$P_g$, i.e. $px = p$, $qx = q$ and $py = q$ for some distinct states $p, q$ and nonempty
words $x, y$. Since $A$ is reduced, $p = q_0 u$ for some $u \in \Sigma^*$, and there exists a word
$w$ distinguishing $p$ and $q$. The words $ux^k x^k w$ and $ux^k y x^k w$
have the same prefix and the same suffix of length $k$, but exactly one of them
is a member of $L(A)$, a contradiction. □



4.2 Complexity issues

Using the characterization given in Theorem 1, we study the complexity of the
following decision problem GenDef: given a finite automaton $A$, is $L(A)$ a
generalized definite language?

Theorem 2. Problem GenDef is NL-complete.

Proof. First we show that GenDef belongs to NL. By [3], minimizing a DFA
can be done in nondeterministic logspace. Thus we can assume that the input
is already minimized, since the class of (nondeterministic) logspace computable
functions is closed under composition.

Consider the following algorithm:

1. Guess two different states $p$ and $q$.
2. Let $s := p$.
3. Guess a letter $a \in \Sigma$. Let $s := sa$.
4. If $s = q$, proceed to Step 5. Otherwise go back to Step 3.
5. Let $p' := p$ and $q' := q$.
6. Guess a letter $a \in \Sigma$. Let $p' := p'a$ and $q' := q'a$.
7. If $p = p'$ and $q = q'$, accept the input. Otherwise go back to Step 6.

The above algorithm checks whether $A$ admits $P_g$. First it guesses $p \neq q$. Then,
in Steps 2-4, it checks whether $q$ is reachable from $p$, and if so, in Steps
5-7 it checks whether there exists a word $x \in \Sigma^+$ with $px = p$ and $qx = q$.
Thus it decides the complement of GenDef in nondeterministic logspace; since
NL = coNL, we get that GenDef ∈ NL as well.

For NL-completeness we recall from [5] that the reachability problem for DAGs
(DAG-Reach) is complete for NL: given a directed acyclic graph $G = (V, E)$
on $V = [n]$ with $(i, j) \in E$ only if $i < j$, is $n$ reachable from 1? We give a
logspace reduction from DAG-Reach to GenDef as follows. Let $G = ([n], E)$
be an instance of DAG-Reach. For a vertex $i \in [n]$, let $N(i) = \{j : (i, j) \in E\}$
stand for the set of its neighbours and let $d(i) = |N(i)| < n$ denote the outdegree
of $i$. When $j \in [d(i)]$, the $j$th neighbour of $i$, denoted $n(i, j)$, is simply the
$j$th element of $N(i)$ (with respect to the usual ordering of integers).
Note that for any $i \in [n]$ and $j \in [d(i)]$, both $d(i)$ and $n(i, j)$ can
be computed in logspace.

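We define the automaton $A = ([n+1], [n], \delta, 1, \{n+1\})$ where
$$\delta(i, j) = \begin{cases} n+1 & \text{if } i = n+1, \text{ or } j = n, \text{ or } (i < n \text{ and } d(i) < j);\\ 1 & \text{if } i = n \text{ and } j < n;\\ n(i, j) & \text{otherwise.} \end{cases}$$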



² Note that in this form, the algorithm can enter an infinite loop, which fits into the
definition of nondeterministic logspace. Introducing a counter and allowing at most
$n$ steps in the first cycle and at most $n^2$ in the second, we get a nondeterministic
algorithm using logspace and polynomial time, as usual.
Note that $A$ is indeed an automaton, i.e. $\delta(i, j)$ is well-defined for each $i, j$.

We claim that $A$ admits $P_g$ if and only if $n$ is reachable from 1 in $G$. Observe
that the underlying graph of $A$ is $G$ with a new edge $(n, 1)$ and with a new
vertex $n+1$, which is a neighbour of each vertex. Hence $\{n+1\}$ is a sink of $A$
which is reachable from all other states. Thus $A$ admits $P_g$ if and only if there
exists a nontrivial component of $A$ which is different from $\{n+1\}$. Since $G$
has no cycles, such a component exists if and only if the addition of the edge
$(n, 1)$ introduces a cycle, which happens exactly when $n$ is reachable
from 1. Note that this is exactly the case when $1x = 1$ for some word $x \in \Sigma^+$.

What remains is to show that the reduced form $B$ of $A$ admits $P_g$ if and only
if $A$ does. First, both 1 and $n+1$ are in the connected part $A'$ of $A$, and they are
distinguishable by the empty word (since $n+1$ is final and 1 is not). Thus, if $A$
admits $P_g$ with $1x = 1$ and $(n+1)x = n+1$ for some $x \in \Sigma^+$, then $B$ admits $P_g$
with $h(1)x = h(1)$ and $h(n+1)x = h(n+1)$ (with $h$ being the homomorphism
from the connected part of $A$ onto its reduced form). For the other direction,
assume $h(p)x_0 = h(p)$ for some state $p \neq n+1$ (note that since $n+1$ is the
only final state, $p \neq n+1$ if and only if $h(p) \neq h(n+1)$). Let us define the
sequence $p_0, p_1, \ldots$ of states of $A$ as $p_0 = p$, $p_{t+1} = p_t x_0$. Then for each $i \geq 0$,
$h(p_i) = h(p)$, thus $p_i \in [n]$. Hence there exist indices $0 \leq i < j$ with $p_i = p_j$,
yielding $p_i x_0^{j-i} = p_i$; thus $A$ admits $P_g$ with $p = p_i$, $q = n+1$, $x = x_0^{j-i}$ and
any nonempty word $y$ with $p_i y = n+1$ (which exists since $\{n+1\}$ is reachable from every state).

Hence the above construction is indeed a logspace reduction from DAG-Reach
to the complement of GenDef, showing NL-hardness of the latter. Applying
NL = coNL again, we get NL-hardness of GenDef itself. □
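The reduction can be prototyped directly. This minimal sketch assumes the case split defining $\delta$ in the proof, with states $1, \ldots, n+1$, letters $1, \ldots, n$, and the sink $n+1$. It tests the component condition from the proof (some state of $[n]$ lies on a cycle) rather than running a full $P_g$ check on the reduced automaton. Both function names are hypothetical, and the sketch is meant for small instances, not as a logspace implementation.

```python
from typing import Dict, Iterable, Tuple


def dag_reach_automaton(n: int, edges: Iterable[Tuple[int, int]]) -> Dict[Tuple[int, int], int]:
    """Transition table delta[(i, j)] of A = ([n+1], [n], delta, 1, {n+1})."""
    edge_list = list(edges)
    nbrs = {i: sorted({j for (x, j) in edge_list if x == i}) for i in range(1, n + 1)}
    delta = {}
    for i in range(1, n + 2):
        d = len(nbrs.get(i, []))           # outdegree d(i); unused for i = n+1
        for j in range(1, n + 1):
            if i == n + 1 or j == n or (i < n and d < j):
                delta[(i, j)] = n + 1
            elif i == n:                   # here j < n: the new edge (n, 1)
                delta[(i, j)] = 1
            else:                          # i < n, j <= d(i): the j-th neighbour n(i, j)
                delta[(i, j)] = nbrs[i][j - 1]
    return delta


def has_cycle_outside_sink(n: int, delta: Dict[Tuple[int, int], int]) -> bool:
    """True iff some state in [n] lies on a cycle, i.e. A has a nontrivial
    component other than {n+1}."""
    for s in range(1, n + 1):
        stack = [delta[(s, a)] for a in range(1, n + 1)]
        seen = set()
        while stack:
            q = stack.pop()
            if q == s:
                return True
            if q in seen or q == n + 1:    # the sink n+1 never leads back
                continue
            seen.add(q)
            stack.extend(delta[(q, a)] for a in range(1, n + 1))
    return False
```

For example, with $n = 3$ and edges $(1, 2), (2, 3)$ the states $1 \to 2 \to 3 \to 1$ form a cycle, so the test answers yes. With the single edge $(1, 2)$ the vertex 3 is unreachable from 1 and no cycle arises. The degenerate case $n = 1$ is not handled by this construction.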

It is worth observing that the same construction also shows NL-hardness (thus 
completeness) of the problem whether the input automaton accepts a definite 
language. 

Thus the complexity of the problem is characterized from the theoretical point
of view. However, nondeterministic algorithms are not that useful in practice.
Since NL ⊆ P, the problem is solvable in polynomial time. We now give an
efficient (quadratic) deterministic decision algorithm:

1. Compute $A' = (Q, \Sigma, \delta, q_0, F)$, the reduced form of the input automaton $A$.

2. Compute $\Gamma(A')$, the component graph of $A'$.

3. If there exists a nontrivial, non-sink component, reject the input.

4. Compute $B = A' \times A'$ and $\Gamma(B)$.

5. Check whether there exists a state $(p, q)$ of $B$ in a nontrivial component (of
$B$) for some $p \neq q$ with $p$ being in the same sink as $q$ in $A'$. If so, reject the
input; otherwise accept it.

The correctness of the algorithm follows directly from Theorem 1. After minimization
(which takes $O(n \log n)$ time), one computes the component graph of
the reduced automaton (taking linear time) and checks whether there exists a
nontrivial component which is not a sink (taking linear time again, since we
already have the component graph). If so, the answer is NO. Otherwise one
has to check whether there is a (sink) component $C$ and a word $x \in \Sigma^+$ such that
$x|_C$ has at least two different fixed points. This is equivalent to asking whether
there is a state $(p, q)$ in $A' \times A'$ with $p \neq q$ in the same component and
a word $x \in \Sigma^+$ with $(p, q)x = (p, q)$. This is further equivalent to asking whether
there is a pair $(p, q)$ with $p \neq q$ in the same sink such that $(p, q)$ is in a nontrivial
component of $B$. Computing $B$ and its components takes $O(n^2)$ time, and (since
we still have the component graph of $A'$) checking this condition takes constant
time for each state $(p, q)$ of $B$. Hence the algorithm takes a total of $O(n^2)$ time.

Hence we have an upper bound concluding this subsection:

Theorem 3. Problem GenDef can be solved in $O(n^2)$ deterministic time in
the RAM model of computation.
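Steps 2-5 of the deterministic algorithm can be sketched in Python as follows. The sketch assumes the input is already reduced, so Step 1 (minimization, e.g. Hopcroft's algorithm) is omitted. It uses our `delta[q][a]` encoding and recomputes components with Kosaraju's algorithm instead of storing $\Gamma(A')$ separately. This is a hypothetical illustration, not the authors' implementation. For a fixed alphabet, $B$ has $n^2$ states and $|\Sigma| n^2$ edges, so the whole check stays quadratic.

```python
from typing import List


def scc(succ: List[List[int]]) -> List[int]:
    """Kosaraju: component index of every vertex of the graph succ."""
    n = len(succ)
    pred: List[List[int]] = [[] for _ in range(n)]
    for v in range(n):
        for w in succ[v]:
            pred[w].append(v)
    order, seen = [], [False] * n
    for s in range(n):
        if seen[s]:
            continue
        seen[s] = True
        stack = [(s, iter(succ[s]))]
        while stack:
            v, it = stack[-1]
            w = next(it, None)
            if w is None:
                stack.pop()
                order.append(v)
            elif not seen[w]:
                seen[w] = True
                stack.append((w, iter(succ[w])))
    comp, count = [-1] * n, 0
    for s in reversed(order):
        if comp[s] == -1:
            comp[s], todo = count, [s]
            while todo:
                v = todo.pop()
                for w in pred[v]:
                    if comp[w] == -1:
                        comp[w] = count
                        todo.append(w)
            count += 1
    return comp


def nontrivial(succ: List[List[int]], comp: List[int]) -> List[bool]:
    """nontrivial[v]: v is in a component with >= 2 vertices or has a self-loop."""
    size = {}
    for c in comp:
        size[c] = size.get(c, 0) + 1
    return [size[comp[v]] > 1 or v in succ[v] for v in range(len(succ))]


def is_generalized_definite(delta: List[List[int]]) -> bool:
    """Steps 2-5 for a *reduced* automaton given by delta[q][a]."""
    n, m = len(delta), len(delta[0])
    comp = scc(delta)                     # rows of delta are successor lists
    nt = nontrivial(delta, comp)
    is_sink = {c: True for c in comp}
    for p in range(n):
        for q in delta[p]:
            if comp[q] != comp[p]:
                is_sink[comp[p]] = False
    # Step 3: reject if some nontrivial component is not a sink.
    if any(nt[p] and not is_sink[comp[p]] for p in range(n)):
        return False
    # Step 4: B = A' x A', with the pair (p, q) encoded as p * n + q.
    succ_b = [[delta[p][a] * n + delta[q][a] for a in range(m)]
              for p in range(n) for q in range(n)]
    nt_b = nontrivial(succ_b, scc(succ_b))
    # Step 5: reject if some (p, q), p != q in the same sink, lies on a cycle of B.
    return not any(nt_b[p * n + q] for p in range(n) for q in range(n)
                   if p != q and comp[p] == comp[q] and is_sink[comp[p]])
```

On the automaton for "starts with `a` and ends with `b`" (`[[1, 3], [1, 2], [1, 2], [3, 3]]`), the check accepts. On the parity-of-`a` automaton (`[[1, 0], [0, 1]]`), the pair $(0, 1)$ is fixed by `b`, so it rejects.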

4.3 Syntactic complexity 

The syntactic complexity of a language is the size of its syntactic semigroup, the
latter being isomorphic to the transformation semigroup $T(A)$ of the minimal
automaton $A$ of the language (equipped with function composition as product).
The syntactic complexity of a class $C$ of regular languages is the function $n \mapsto f(n)$,
where $f(n)$ is the maximal syntactic complexity of a member of $C$ whose
minimal automaton has at most $n$ states.
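Since the syntactic complexity is $|T(A)|$ for the minimal automaton $A$, it can be computed for small examples by closing the letter transformations under composition. The minimal sketch below uses our `delta[q][a]` encoding, and the name `transformation_semigroup` is hypothetical. It can be exponential in the worst case, since $|T(A)|$ itself can reach $n^n$.

```python
from typing import List, Set, Tuple


def transformation_semigroup(delta: List[List[int]]) -> Set[Tuple[int, ...]]:
    """All transformations u^A with u nonempty, for the automaton delta[q][a]."""
    n, m = len(delta), len(delta[0])
    letters = [tuple(delta[q][a] for q in range(n)) for a in range(m)]
    found = set(letters)
    frontier = list(found)
    while frontier:
        t = frontier.pop()
        for g in letters:
            tg = tuple(g[t[q]] for q in range(n))  # (ua)^A: apply u^A, then a^A
            if tg not in found:
                found.add(tg)
                frontier.append(tg)
    return found
```

With three letters acting as a 3-cycle, a transposition and a rank-2 map (`[[1, 1, 0], [2, 0, 0], [0, 2, 2]]`), the closure is the full transformation semigroup with $3^3 = 27$ elements. This matches the $n^n$ remark in the introduction. The 2-state definite automaton `[[1, 0], [1, 0]]` gives only $2 = 2!$ elements.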

In [5] it has been shown that the class of definite languages has syntactic
complexity at least $\lfloor e \cdot (n-1)! \rfloor$; thus the same lower bound also applies to the larger
class of generalized definite languages.

Theorem 4. The syntactic complexity of the definite and that of the generalized 
definite languages coincide. 

Proof. It suffices to construct, for an arbitrary reduced automaton $A = (Q, \Sigma, \delta, q_0, F)$
recognizing a generalized definite language, a reduced automaton $B = (Q, \Delta, \delta', q_0, F')$
over some alphabet $\Delta$, recognizing a definite language, such that $|T(A)| \leq |T(B)|$.

By Theorem 1, if $L(A)$ is generalized definite and $A$ is reduced, then $Q$ can be
partitioned as a disjoint union $Q = Q_0 \uplus Q_1 \uplus \cdots \uplus Q_c$ for some $c > 0$ such that
each $Q_i$ with $i \in [c]$ is a sink of $A$ and $Q_0$ is the (possibly empty) set of those
states that belong to a trivial component. Without loss of generality we can
assume that $Q = [n]$ and $Q_0 = [k]$ for some $n$ and $k$, and that $i < ia$ for each $i \in [k]$
and $a \in \Sigma$. The latter condition holds because reachability
restricted to the set $Q_0$ of states in trivial components is a partial ordering of
$Q_0$, which can be extended to a linear ordering. Clearly, if $Q_0$ is nonempty, then
by connectedness $q_0 = 1$ has to hold; otherwise $c = 1$ and we may again assume
$q_0 = 1$. Also, $Q_i \Sigma \subseteq Q_i$ for each $i \in [c]$, and we may assume $|Q_1| \leq |Q_2| \leq \cdots \leq |Q_c|$.

Then each transformation $f : Q \to Q$ mapping each sink $Q_i$ into itself can be uniquely written as the source
tupling $[f_0, \ldots, f_c]$ of some functions $f_i : Q_i \to Q$ with $f_i : Q_i \to Q_i$ for $0 < i \leq c$.



For any $[f_0, \ldots, f_c] \in T = T(A)$ the following hold: $f_0(i) > i$ for each $i \in [k]$,
and $f_j$ is non-permutational on $Q_j$ for each $j \in [c]$. For $k = 0, \ldots, c$, let $T_k$
stand for the set $\{f_k : f \in T\}$ (i.e. the set of functions $f|_{Q_k}$ with $f \in T$). Then
$$|T| \leq \prod_{0 \leq k \leq c} |T_k|.$$

If $|Q_c| = 1$, then all the sinks of $A$ are singleton sets. Thus there are at most
two sinks. Indeed, if $C$ and $D$ are singleton sinks whose members agree in
finality, then their members are not distinguishable, thus $C = D$ since $A$ is
reduced. Such automata recognize reverse definite languages, having a syntactic
semigroup of size at most $(n-1)!$ by [2]. Thus in that case $B$ can be chosen to be an
arbitrary definite automaton having $n$ states and a syntactic semigroup of size
at least $\lfloor e(n-1)! \rfloor$ (by the construction in [2], such an automaton exists). Thus
we may assume that $|Q_c| > 1$. (Note that in that case $Q_c$ contains at least one
final and at least one non-final state.)

Let us define the sets $T'_k$ of functions $Q_k \to Q$ as follows: $T'_0$ is the set of all elevating
functions from $[k]$ to $[n]$, $T'_c = T_c$, and for each $0 < k < c$, $T'_k = Q_c^{Q_k}$. Since
$T_k \subseteq Q_k^{Q_k}$ and $|Q_k| \leq |Q_c|$ for each $k \in [c]$, we have $|T_k| \leq |T'_k|$ for each
$0 \leq k \leq c$. Thus, defining $T' = \{[f_0, \ldots, f_c] : f_k \in T'_k\}$, it holds that $|T| \leq |T'|$.
We define $B$ as $(Q, T', \delta', q_0, F)$ with $\delta'(q, f) = f(q)$ for each $f \in T'$. We show
that $B$ is a reduced automaton avoiding $P_d$, concluding the proof.

First, observe that $B$ has exactly one sink, $Q_c$, and all the other states belong to
trivial components (since by each transition, each member of $Q_0$ gets elevated,
and each member of $Q_i$ with $0 < i < c$ is taken into $Q_c$). Hence if $B$ admits
$P_d$, then $pt = p$ and $qt = q$ for some distinct pair $p, q \in Q_c$ of states and some
$t = [t'_0, \ldots, t'_c] \in T'$ (note that $T'$ is closed under composition, so every nonempty
word of $B$ induces a member of $T'$). This is further equivalent to $pt'_c = p$ and $qt'_c = q$ for some
$p \neq q$ in $Q_c$ and $t'_c \in T'_c$. By the definition $T'_c = T_c$, there exists a transformation
of the form $t = [t_0, \ldots, t_{c-1}, t'_c] \in T$ induced by some word $x$; thus $px = p$ and
$qx = q$ both hold in $A$. Since $p, q$ are in the same sink, there also exists a
word $y$ with $py = q$. Hence $A$ admits $P_g$, a contradiction.

Second, $B$ is connected. To see this, observe that each state $p \neq 1$ is reachable
from 1 by any transformation of the form $t = [f_p, t_1, \ldots, t_c]$, where $f_p : [k] \to [n]$
is the elevating function with $f_p(1) = p$ and $f_p(i) = n$ for each $i > 1$. Of course, 1 is
also trivially reachable from itself; thus $B$ is connected.

Also, whenever $p \neq q$ are different states of $B$, they are distinguishable
by some word. To see this, we first show it for $p, q \in Q_c$. Indeed, since $A$ is
reduced, some transformation $t = [t_0, \ldots, t_c] \in T$ separates $p$ and $q$ (exactly one
of $pt = pt_c$ and $qt = qt_c$ belongs to $F$). Since $T'_c = T_c$, $p$ and $q$ are also
distinguishable in $B$ by any transformation of the form $t' = [t'_0, \ldots, t'_{c-1}, t_c] \in T'$.
Now suppose neither $p$ nor $q$ belongs to $Q_c$. The set
$\{[t'_0, \ldots, t'_{c-1}] : t'_i \in T'_i\}$ contains every function $Q_0 \cup \cdots \cup Q_{c-1} \to Q_c$, and $|Q_c| > 1$.
Hence there exists some $t' = [t'_0, \ldots, t'_{c-1}]$ with $pt' \neq qt'$, both in $Q_c$.
Thus any transformation of the form $[t'_0, \ldots, t'_{c-1}, t_c] \in T'$ maps $p$ and $q$ to
distinct elements of $Q_c$, which are already known to be distinguishable, and so
are $p$ and $q$. Finally, if $p \in Q_c$ and $q \notin Q_c$, let $t_c \in T_c$ be arbitrary and
let $t' = [t'_0, \ldots, t'_{c-1}]$ map into $Q_c$ with $qt' \neq pt_c$. Then $[t', t_c]$ again maps $p$ and $q$ to
distinct states of $Q_c$.

Thus $B$ is reduced, concluding the proof: $B$ is a reduced automaton recognizing
a definite language and having a syntactic semigroup $T'$ with $|T'| \geq |T|$. □



4.4 Upper bound for syntactic complexity

By [2] we know a lower bound $\lfloor e(n-1)! \rfloor$ for the syntactic complexity of the
definite languages (thus, of the generalized definite ones as well). In this subsection
we give an upper bound of $n!$, showing that the bound of [2] is asymptotically
optimal up to a logarithmic factor (since $n = O(\log n!)$).

Let $A = (Q, \Sigma, \delta, q_0, F)$ be a reduced automaton recognizing a definite language
$L$ and let $T \subseteq Q^Q$ be its syntactic semigroup. Then each member $t$ of $T$ is
non-permutational and has a unique fixed point $\mathrm{fix}(t)$. For each $p \in Q$, let $T_p$ stand
for the subset $\{t \in T : \mathrm{fix}(t) = p\}$ of $T$; then $T$ is the disjoint union of the sets
$T_p$. Observe that $T_p$ is a semigroup for each $p$. Indeed, whenever $\mathrm{fix}(t) = \mathrm{fix}(t') = p$,
we have $ptt' = p$, thus $p$ is a fixed point of $tt'$ (and by assumption, the superset $T$
of $T_p$ is a semigroup consisting only of non-permutational transformations). Thus
$tt' \in T_p$ as well.

Lemma 2. For each $p \in Q$, $|T_p| \leq (n-1)!$.

Proof. Let $G_p = (Q, E, \ell)$ be the edge-labelled graph on the set $Q$ of vertices in
which $(q_1, q_2)$ is an edge labelled by $t \in T_p$ if and only if $q_1 t = q_2$ and $q_1 \neq p$.
Then $G_p$ is acyclic.

Indeed, suppose $q_1 \xrightarrow{t_1} q_2 \xrightarrow{t_2} \cdots \xrightarrow{t_k} q_{k+1} = q_1$. Then $q_1 t_1 t_2 \cdots t_k = q_1$, thus $q_1$ is
a fixed point of $t = t_1 \cdots t_k \in T_p$. Since the vertex $p$ has outdegree 0 in $G_p$,
$q_1 \neq p$; hence $t$ has at least two distinct fixed points, a contradiction. Hence $G_p$
is acyclic. Thus there exists a linear ordering $\prec$ on $Q$ such that whenever $q_1 t = q_2$ for
some $q_1, q_2 \in Q$ with $q_1 \neq p$ and $t \in T_p$, then $q_1 \prec q_2$. Note also that $p$ is the maximal
element of $\prec$. Thus $T_p$ is contained in the set of transformations $t : Q \to Q$ with $pt = p$ and
$q \prec qt$ for each $q \in Q \setminus \{p\}$. There are $(n-1)!$ such transformations: the least
element can be mapped to any of the other $n-1$ elements, the next one to any of $n-2$ elements, and so
on. This concludes the proof of the lemma. □
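The counting step at the end of the proof can be sanity-checked by brute force for small $n$. This minimal sketch relabels states so that $\prec$ is $1 < 2 < \cdots < n$ with $p = n$, a harmless renaming assumed here for illustration. It then counts the transformations allowed by the proof. The function name is hypothetical.

```python
from itertools import product
from math import factorial


def count_bounding_transformations(n: int) -> int:
    """Count t: [n] -> [n] with t(n) = n and i < t(i) for every i < n."""
    count = 0
    for t in product(range(1, n + 1), repeat=n):  # t[i - 1] is the image of i
        if t[n - 1] == n and all(t[i - 1] > i for i in range(1, n)):
            count += 1
    return count


# The count equals (n-1)! for small n, matching Lemma 2.
assert all(count_bounding_transformations(n) == factorial(n - 1) for n in range(1, 6))
```

Multiplying by the $n$ possible fixed points $p$ gives the $n \cdot (n-1)! = n!$ bound of Corollary 1.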

Corollary 1. The syntactic complexity of definite languages is at most $n!$.

Proof. For an arbitrary reduced automaton $A$ with $n$ states recognizing a definite
language, $T(A) = \bigcup_{p \in Q} T_p$; hence its size is at most $n \cdot (n-1)! = n!$. □

5 Conclusion and further directions

The forbidden pattern characterization of generalized definite languages we gave
is not surprising, given the identities of the pseudovariety of (syntactic)
semigroups corresponding to this variety of languages. Still, using this
characterization one can derive efficient algorithms for checking whether a given
automaton recognizes such a language. Though we could not compute an exact function for
the syntactic complexity, we still managed to show that these languages are not
"more complex" than definite languages under this metric. We also gave a new
upper bound for it.

The exact syntactic complexity of definite languages is still open, as is that
of other language classes higher in the dot-depth hierarchy, e.g. the locally
(threshold) testable and the star-free languages.
Appendix

In the Appendix we give a proof of Lemma 1 and of the fact that a regular language $L$ is
definite if and only if its minimal automaton avoids $P_d$.

We will make use of the following variant of the multicolor Ramsey theorem,
stated here only for monochromatic triangles.

Theorem 5. For any number $c > 0$ of colors there exists an integer $R(c)$ such
that whenever $G$ is an edge-colored complete graph on at least $R(c)$ vertices
with at most $c$ colors, $G$ contains a monochromatic triangle.

The theorem holds for monochromatic complete subgraphs of arbitrary size as well,
but we only need the guaranteed appearance of triangles to show that in a finite
semigroup, a long enough product always has an idempotent factor.

Proof (of Lemma 1). Let $m = R(|C^C|)$ and let us define the following complete
graph on $[m]$ with its edges colored by elements of $C^C$: let the color of the edge
$\{i, j\}$, $i < j$, be the element $f_{i,j} = f_i f_{i+1} \cdots f_{j-1} \in C^C$. Applying Theorem 5,
we get that there exist integers $1 \leq i < j < k \leq m$ with $\{i, j\}$, $\{j, k\}$ and $\{i, k\}$
having the same color, i.e. $f_{i,j} = f_{j,k} = f_{i,k}$, the last being the product of $f_{i,j}$
and $f_{j,k}$. Hence $f_{i,j}$ is an idempotent transformation of $C$. □

Now for the forbidden pattern characterization of definite languages:

Theorem 6. The following are equivalent for a reduced automaton $A = (Q, \Sigma, \delta, q_0, F)$:

i) $L(A)$ is definite.

ii) $A$ avoids $P_d$.

iii) For each $u \in \Sigma^+$, $u^A$ is non-permutational.

iv) $A$ has a unique sink $C$, all its other components are trivial, and for each
$u \in \Sigma^+$, $u^A|_C$ is non-permutational.

Proof. i) ⇒ ii). Assume $L = L(A)$ is $k$-definite for some $k > 0$ and $A$ admits $P_d$
with $px = p$ and $qx = q$ for distinct states $p, q$ and a word $x \in \Sigma^+$. Since $A$ is
reduced, $q_0 z_p = p$ and $q_0 z_q = q$ for some words $z_p, z_q$, and $p, q$ are distinguishable
by some word $w$. Then exactly one of the words $z_p x^k w$ and $z_q x^k w$ belongs to
$L$, but they share a common suffix of length $k$, a contradiction.

ii) ⇒ iii). Assume $u^A$ is permutational for some $u \in \Sigma^+$. Let $D \subseteq Q$ with $|D| > 1$ be
a set on which $u$ induces a permutation. Then $u^{|D|!}$ induces the identity on $D$;
thus $A$ admits $P_d$ with arbitrary distinct $p, q \in D$ and $x = u^{|D|!}$, contradicting ii).

iii) ⇒ iv). Obviously $A$ has a sink $C$. If $u^A$ is non-permutational for each $u \in \Sigma^+$,
then $u^A|_C$ is also non-permutational for each sink $C$. Hence $u^{|Q|}$ induces a
constant function on $C$. Assume that there exists another nontrivial component
$D \neq C$ of $A$. Then $p x_0 = p$ for some $p \in D$ and $x_0 \in \Sigma^+$. Thus $x_0^{|Q|}$ induces
a permutational transformation on $Q$ (with the two distinct fixed points $p \in D$ and the unique
element of $C x_0^{|Q|}$), a contradiction. In particular, $C$ is the unique sink, as sinks are nontrivial.

iv) ⇒ i). Analogously to the direction ii) ⇒ iii) of the proof of Theorem 1, suppose
the condition of iv) holds. Let $n = |Q| + m(|C|)$, where $m$ is the function defined
in Lemma 1. Let $x = y x_2$ with $x_2 \in \Sigma^n$ and $y \in \Sigma^*$. It suffices to show that
$q_0 y x_2 = q_0 x_2$. Write $x_2 = x' x''$ with $|x'| = |Q|$ and $|x''| = m(|C|)$. On each of the paths
determined by $x'$ starting from $q_0 y$ and from $q_0$, some state is visited twice. Since $C$ is the only
nontrivial component and it is a sink, both $q_0 y x'$ and $q_0 x'$ belong to $C$. By Lemma 1,
$x''$ can be written as $x'' = x''_1 x''_2 x''_3$ with $x''_2$ inducing an idempotent function on $C$.
Since the function induced by $x''_2$ is also non-permutational on $C$, it is a constant
function on $C$; hence $x''$ induces a constant function on $C$ as well. Thus
$q_0 y x_2 = q_0 x_2$ and $L(A)$ is $n$-definite. □